PROBLEMS OF ASSOCIATION FOR BIVARIATE
CIRCULAR DATA AND A NEW TEST OF INDEPENDENCE

BY 

Madan L. Puri¹ and J. S. Rao²

Indiana  University,  Bloomington 

I. Introduction and Summary

Statistical data where the observations are directions in
either two or three dimensions occur naturally in many diverse
fields such as geology, biology, astronomy and medicine, among
others. These directions may be represented as unit vectors,
that is, as points on the circumference of the unit circle
if two dimensional, or on the unit sphere in case they
are three dimensional. These directions may also be represented
in terms of angles with respect to a fixed "zero direction." It
is natural to require that statistical techniques for such data
be independent of this arbitrarily chosen zero direction,
as well as of the sense of rotation, that is, whether one takes
clockwise or anticlockwise as the positive direction. These natural
restrictions rule out the application of most of the standard
statistical methods to directional data. For instance, even
the usual "Arithmetic Mean" and the "Standard Deviation" fail to
be meaningful measures of location and dispersion. This novel
area of statistics has been receiving increasing attention only
recently, and most of the statistical developments in this area
have occurred during the past two decades, especially after the
appearance of a paper by Fisher (1953). For a general survey of
this field, the reader is referred to Mardia (1972) and Batschelet
(1965).
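The failure of the arithmetic mean can be seen in a small numerical sketch. The function names below are illustrative and not from the paper; the "circular mean" used for comparison is the direction of the resultant of the unit vectors, which is one common choice.

```python
import math

def arithmetic_mean_deg(angles_deg):
    """Ordinary mean of the angle values; depends on the zero direction."""
    return sum(angles_deg) / len(angles_deg)

def circular_mean_deg(angles_deg):
    """Direction of the resultant of the unit vectors, in degrees."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0

def circ_diff_deg(a, b):
    """Shorter arc between two directions, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

# Two directions close to the zero direction, one on either side of it.
angles = [350.0, 10.0]
# The same two directions after moving the zero direction by 20 degrees.
shifted = [(a + 20.0) % 360.0 for a in angles]

# The arithmetic mean points the opposite way (180) for the first coding and
# does not simply move by 20 degrees when the zero direction is changed.
print(arithmetic_mean_deg(angles), arithmetic_mean_deg(shifted))
# The resultant direction moves with the data.
print(circular_mean_deg(angles), circular_mean_deg(shifted))
```

The resultant direction moves by exactly the 20 degree shift, whereas the arithmetic mean of the same two directions changes from 180 to 20.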


1 Work supported by the Air Force Office of Scientific Research,
AFSC, USAF, under Grant No. AFOSR 76-2927. Reproduction in whole
or in part permitted for any purpose of the U.S. Government.

2 Present address: University of California, Santa Barbara.
In this paper we shall restrict our attention to circular data, that is,
data on two dimensional directions. In many instances, we may measure more
than one direction corresponding to each "individual" in the population, either
because we wish to gain more information than a single measurement
could give or because we wish to study the interrelationships, if any (for
example, correlation and regression), between such variates. Also, one might
be interested in regression problems like predicting the paleocurrent direction
on the basis of cross-bedding dip directions or pebble orientations, or in
problems of correlation like measuring the association between the flight directions
of pigeons and the prevalent wind direction. This important area of multivariate
directional data analysis has not so far received much attention, with the
result that the research work (on multivariate situations) has been very limited.

To acquaint the reader with some background material, we give in section two
a brief review of some of the literature on association and independence for
bivariate circular data. In section three, we introduce a new test of independence
for measurements on a torus, that is, bivariate circular data. This test is
especially relevant when dealing with axial data, i.e., observations from
a circular distribution which has antipodal symmetry. The asymptotic
distribution theory of the proposed test statistic is derived in the last section.

II.  Problems  of  Association  and  Independence 

As stated in section one, we give here a brief survey of the papers of
Downs (1974), Mardia (1975), Gould (1969) and Rothman (1971), all of which
deal with problems of association for angular variables. While the first two
of these papers attempt to define a measure of correlation for angular variates,
the third discusses problems of regression, and the last introduces a test of
independence. In what follows, $(X_1, Y_1), \ldots, (X_n, Y_n)$ will denote a random
sample of observations on a torus, that is, $X_i$ as well as $Y_i$ are angles in $[0, 2\pi)$.
Downs (1974) defines a measure of "rotational correlation" for circular
data which is in many ways analogous to the product moment correlation on the
line. Let


$$
\bar{C}_x = n^{-1}\sum_{i=1}^{n}\cos X_i, \qquad
\bar{S}_x = n^{-1}\sum_{i=1}^{n}\sin X_i, \qquad
\bar{C}_y = n^{-1}\sum_{i=1}^{n}\cos Y_i, \qquad
\bar{S}_y = n^{-1}\sum_{i=1}^{n}\sin Y_i
$$


be  the  arithmetic  means  of  cosine  and  sine  components  of  X and  Y. 
Let 


$$
T = \frac{1}{n}\sum_{i=1}^{n}
\begin{pmatrix} \cos Y_i - \bar{C}_y \\ \sin Y_i - \bar{S}_y \end{pmatrix}
\begin{pmatrix} \cos X_i - \bar{C}_x, & \sin X_i - \bar{S}_x \end{pmatrix}.
$$


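Set

$$
S_x^2 = 1 - (\bar{C}_x^2 + \bar{S}_x^2), \qquad S_y^2 = 1 - (\bar{C}_y^2 + \bar{S}_y^2),
$$

and

$$
S_{xy} = \left| T (T'T)^{-1/2} \right| \, \operatorname{tr}\,(T'T)^{1/2},
$$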


where $(T'T)^{1/2}$ is the unique symmetric positive definite matrix whose square
is $(T'T)$ and $(T'T)^{-1/2}$ is its inverse. Note that the determinant $|T(T'T)^{-1/2}|$
is $\pm 1$ because of the orthogonality of the matrix. Also, if $R_x^2$ and $R_y^2$ denote
the squared lengths of the resultants corresponding to the directions $(X_1, \ldots, X_n)$
and $(Y_1, \ldots, Y_n)$ respectively, then it is easily seen that

$$
S_x^2 = 1 - R_x^2/n^2, \qquad S_y^2 = 1 - R_y^2/n^2,
$$


which are the commonly used measures of variation. A justification for the
rather complex definition of $S_{xy}$ is not hard to find and the reader is referred to
Downs (1974). Downs defines the circular rotational correlation
$\gamma_c = S_{xy}/(S_x \cdot S_y)$. It can be seen that $\gamma_c$ lies between $-1$ and $+1$,
attaining one of the extreme values only when the Y-deviation (from its
resultant direction) is a constant multiple of an orthogonal transformation
(that is, a rotation) of the X-deviation for every pair $(X_i, Y_i)$ in the sample.
One has to keep in mind, however, that this may not be an appropriate measure
of association when the correlation is not strictly rotational. $\gamma_c$ remains
invariant if the origins are changed for the X and Y measurements. The
sampling distribution of $\gamma_c$ has not been investigated so far and this limits the use
of $\gamma_c$ for statistical inference.
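The definition above can be sketched numerically as follows. This is only an illustrative reading of the formulas, not code from Downs (1974). For a 2 x 2 matrix the trace of $(T'T)^{1/2}$ is the sum of the singular values of $T$, which equals $\sqrt{\|T\|_F^2 + 2|\det T|}$, and the determinant of $T(T'T)^{-1/2}$ is the sign of $\det T$; the function name and the treatment of a singular $T$ (sign taken as $+1$) are assumptions made here.

```python
import math

def downs_rotational_correlation(xs, ys):
    """Rotational correlation S_xy / (S_x * S_y) for angles in radians."""
    n = len(xs)
    cx = sum(math.cos(x) for x in xs) / n
    sx = sum(math.sin(x) for x in xs) / n
    cy = sum(math.cos(y) for y in ys) / n
    sy = sum(math.sin(y) for y in ys) / n

    # T = (1/n) * sum of (Y-deviation column) times (X-deviation row).
    t11 = t12 = t21 = t22 = 0.0
    for x, y in zip(xs, ys):
        dx0, dx1 = math.cos(x) - cx, math.sin(x) - sx
        dy0, dy1 = math.cos(y) - cy, math.sin(y) - sy
        t11 += dy0 * dx0
        t12 += dy0 * dx1
        t21 += dy1 * dx0
        t22 += dy1 * dx1
    t11, t12, t21, t22 = t11 / n, t12 / n, t21 / n, t22 / n

    det = t11 * t22 - t12 * t21
    frob_sq = t11 ** 2 + t12 ** 2 + t21 ** 2 + t22 ** 2
    # Sum of the two singular values of T, i.e. tr (T'T)^(1/2).
    trace_root = math.sqrt(frob_sq + 2.0 * abs(det))
    # det of T (T'T)^(-1/2) is +1 or -1; +1 is assumed when det T == 0.
    sign = 1.0 if det >= 0.0 else -1.0

    var_x = 1.0 - (cx ** 2 + sx ** 2)
    var_y = 1.0 - (cy ** 2 + sy ** 2)
    return sign * trace_root / math.sqrt(var_x * var_y)

xs = [0.3, 1.1, 2.5, 4.0, 5.7]
rotated = [(x + 0.9) % (2.0 * math.pi) for x in xs]   # Y is a rotation of X
reflected = [(-x) % (2.0 * math.pi) for x in xs]      # Y is a reflection of X
print(downs_rotational_correlation(xs, rotated))
print(downs_rotational_correlation(xs, reflected))
```

A pure rotation of the sample gives the value $+1$ and a reflection gives $-1$, in line with the extreme cases described in the text.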

Mardia's (1975) correlation coefficient for circular data is based on
the ranks of the observations, and is defined as follows. In analogy with the
linear situation, a perfect correlation is said to exist between X and Y if the
whole probability mass is concentrated on

$$
(lX + mY + Z) \bmod 2\pi = 0
$$

for some positive integers $l$, $m$ and a fixed angular quantity $Z$. Define
$X_i^* = lX_i \pmod{2\pi}$, $Y_i^* = mY_i \pmod{2\pi}$, and let the linear ranks of $X_1^*, \ldots, X_n^*$
be $1, \ldots, n$ respectively and those of $Y_1^*, \ldots, Y_n^*$ be $r_1, \ldots, r_n$ respectively.
The angles $(X_i^*, Y_i^*)$ are then replaced by the uniform scores $(2\pi i/n, \; 2\pi r_i/n)$.
Now let $R_1$ and $R_2$ denote the lengths of the resultants corresponding to the
directions $2\pi(i - r_i)/n$, $i = 1, \ldots, n$, and $2\pi(i + r_i)/n$, $i = 1, \ldots, n$, respectively.
Mardia (1975) defines

$$
\gamma_0 = \max\left(R_1^2/n^2, \; R_2^2/n^2\right)
$$

as the circular rank correlation coefficient. It is clear that $\gamma_0$ lies between
zero and one, and that it remains invariant under changes of zero-directions
of X and Y. We have $R_1^2/n^2 = 1$ for perfect "positive" dependence, and
$R_2^2/n^2 = 1$ for perfect "negative" dependence. $\gamma_0$ will be close to zero if
X and Y are uncorrelated. Mardia (1975) also discusses the asymptotic null
distribution of $\gamma_0$ and gives a table of critical values.
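The rank-based construction can be sketched as below for the simplest case $l = m = 1$ with no ties among the observations; both restrictions, and the function name, are assumptions of this illustration rather than part of Mardia's definition.

```python
import math

def circular_rank_correlation(xs, ys):
    """max(R1^2, R2^2) / n^2 from uniform scores, assuming l = m = 1, no ties."""
    n = len(xs)
    two_pi = 2.0 * math.pi
    xs = [x % two_pi for x in xs]
    ys = [y % two_pi for y in ys]

    # Order the pairs so that the X ranks are 1, ..., n.
    x_order = sorted(range(n), key=lambda j: xs[j])
    # Linear ranks of the Y values.
    y_rank = [0] * n
    for rank, j in enumerate(sorted(range(n), key=lambda j: ys[j]), start=1):
        y_rank[j] = rank

    c1 = s1 = c2 = s2 = 0.0
    for i, j in enumerate(x_order, start=1):
        r = y_rank[j]
        a1 = two_pi * (i - r) / n   # directions giving R1
        a2 = two_pi * (i + r) / n   # directions giving R2
        c1 += math.cos(a1)
        s1 += math.sin(a1)
        c2 += math.cos(a2)
        s2 += math.sin(a2)

    r1_sq = c1 ** 2 + s1 ** 2
    r2_sq = c2 ** 2 + s2 ** 2
    return max(r1_sq, r2_sq) / n ** 2

xs = [0.3, 1.1, 2.5, 4.0, 5.7]
same_sense = [(x + 0.9) % (2.0 * math.pi) for x in xs]
opposite_sense = [(-x) % (2.0 * math.pi) for x in xs]
print(circular_rank_correlation(xs, same_sense))
print(circular_rank_correlation(xs, opposite_sense))
```

In the first case the Y ranks are a cyclic shift of the X ranks, so all the directions $2\pi(i - r_i)/n$ coincide and the coefficient equals one; in the second case the same happens for $2\pi(i + r_i)/n$.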

Gould (1969), on the other hand, considers an analogue of the normal
theory linear regression for circular variables. Let $(x_i, Y_i)$, $i = 1, \ldots, n$, be
observations on directions such that $Y_i$, $i = 1, \ldots, n$, are independently
distributed as circular normal random variables $CN(\alpha + \beta x_i, \kappa)$, that is, with
density

$$
[2\pi I_0(\kappa)]^{-1} \exp\{\kappa \cos(y_i - \alpha - \beta x_i)\},
$$

where $x_1, \ldots, x_n$ are known numbers while $\alpha$, $\beta$ and $\kappa$ are unknown parameters.
Since the logarithm of the likelihood function is

$$
-n \log 2\pi - n \log I_0(\kappa) + \kappa \sum_{i=1}^{n} \cos(Y_i - \alpha - \beta x_i),
$$

the MLE's (Maximum Likelihood Estimates) $\hat{\alpha}$ and $\hat{\beta}$ of $\alpha$ and $\beta$ are the
solutions of the equations

$$
\sum_{i=1}^{n} \sin(Y_i - \alpha - \beta x_i) = 0,
\qquad
\sum_{i=1}^{n} x_i \sin(Y_i - \alpha - \beta x_i) = 0.
$$


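These solutions can be obtained by an iterative procedure discussed in
Gould (1969). After solving for $\hat{\alpha}$ and $\hat{\beta}$, the MLE $\hat{\kappa}$ of $\kappa$ is obtained from
the equation

$$
\frac{I_1(\hat{\kappa})}{I_0(\hat{\kappa})} = \sum_{i=1}^{n} \cos(Y_i - \hat{\alpha} - \hat{\beta} x_i) \, / \, n,
$$

where $I_0$ and $I_1$ denote the modified Bessel functions of the first kind of orders zero and one.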
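One way to see how the two likelihood equations can be solved is the following sketch. It is not the iterative procedure of Gould (1969): as a substitute, it uses the fact that for fixed $\beta$ the first equation is solved by the direction of the resultant of the angles $Y_i - \beta x_i$, and that stationary points in $\beta$ of the length of that resultant satisfy the second equation. The search interval, grid size and function names are assumptions of this illustration, and since the resultant length can have several local maxima in $\beta$, the result depends on the interval supplied.

```python
import math

def resultant_components(beta, xs, ys):
    """Cosine and sine sums of the angles Y_i - beta * x_i."""
    c = sum(math.cos(y - beta * x) for x, y in zip(xs, ys))
    s = sum(math.sin(y - beta * x) for x, y in zip(xs, ys))
    return c, s

def fit_circular_regression(xs, ys, beta_lo, beta_hi, grid=400, iters=80):
    """Grid search plus ternary refinement of beta; alpha in closed form."""
    def length(beta):
        c, s = resultant_components(beta, xs, ys)
        return math.hypot(c, s)

    step = (beta_hi - beta_lo) / grid
    best = max((beta_lo + k * step for k in range(grid + 1)), key=length)

    # Refine within one grid step on either side of the best grid point.
    lo, hi = best - step, best + step
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if length(m1) < length(m2):
            lo = m1
        else:
            hi = m2
    beta = 0.5 * (lo + hi)

    c, s = resultant_components(beta, xs, ys)
    alpha = math.atan2(s, c) % (2.0 * math.pi)
    # Mean cosine of the residuals: the right-hand side of the kappa equation.
    mean_cos = math.hypot(c, s) / len(xs)
    return alpha, beta, mean_cos

# Noise-free illustration with alpha = 0.5 and beta = 0.8.
xs = [0.1 * i for i in range(20)]
ys = [(0.5 + 0.8 * x) % (2.0 * math.pi) for x in xs]
alpha_hat, beta_hat, mean_cos = fit_circular_regression(xs, ys, -2.0, 2.0)
print(alpha_hat, beta_hat, mean_cos)
```

The returned mean cosine would then be equated to $I_1(\hat{\kappa})/I_0(\hat{\kappa})$ to obtain $\hat{\kappa}$; that last step needs the Bessel functions and is left out of the sketch.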

Finally, for testing independence, Rothman (1971) assumes that the
observations $(X_1, Y_1), \ldots, (X_n, Y_n)$ are from a continuous d.f. (distribution
function) $F(x, y)$ whose marginal d.f.'s are $F_1(x)$ and $F_2(y)$. The problem is
to test the hypothesis $F(x, y) = F_1(x) \cdot F_2(y) \;\; \forall \, (x, y)$. With respect to
the given origins for the two variates, let

$$
F_{n1}(x) = n^{-1} \sum_{i=1}^{n} I(X_i; x), \qquad
F_{n2}(y) = n^{-1} \sum_{i=1}^{n} I(Y_i; y), \qquad
F_n(x, y) = n^{-1} \sum_{i=1}^{n} I(X_i; x)\, I(Y_i; y),
$$

where

$$
I(s; t) = \begin{cases} 1 & \text{if } s \le t, \\ 0 & \text{if } s > t. \end{cases}
$$

Thus $F_{n1}(x)$, $F_{n2}(y)$ and $F_n(x, y)$ are the empirical d.f.'s of the
X's, Y's and (X, Y)'s respectively. Distribution free tests of the form

$$
\iint T_n^2(x, y) \, dF_n(x, y), \quad \text{where} \quad T_n(x, y) = [F_n(x, y) - F_{n1}(x) F_{n2}(y)],
$$

were considered earlier by Blum, Kiefer and Rosenblatt (1961) and, since they
are not invariant under different choices of the origins for X and Y, they are
not applicable to the circular case. To circumvent this problem, Rothman (1971)
suggested the modified statistic


$$
C_n = n \int_0^{2\pi}\!\!\int_0^{2\pi} Z_n^2(x, y) \, dF_n(x, y),
$$

where

$$
Z_n(x, y) = \Big[ T_n(x, y) + \iint T_n(x, y) \, dF_n(x, y) - \int T_n(x, y) \, dF_{n1}(x) - \int T_n(x, y) \, dF_{n2}(y) \Big].
$$

The statistic $C_n$ has the desired invariance property and its
asymptotic distribution theory under the null hypothesis has been investigated
by Rothman (1971).

III.  A new  test  for  co-ordinate  independence  of  circular  data. 

Let $\mathcal{F}$ denote the family of probability distributions on the
circumference $[0, 2\pi)$ of the circle with the property $F(\alpha + \pi) = F(\alpha) + \tfrac{1}{2}$ for all $\alpha$.
For instance, circular distributions with axial symmetry would be in this class.
In this section we propose a test which is applicable to testing independence
when the marginals $F_1$ and $F_2$ belong to this class $\mathcal{F}$. When dealing with
axial data, each observed axis will be represented by both its antipodal points
for the purposes of this test. The proposed test may also be applied for testing
independence in the non-axial case. But in this case, corresponding to every
observed direction, we should add its antipodal point also to the data, thus
doubling the original sample size. The effect such "doubling" would have on the
power of the test procedure is presently being investigated. Thus from now on,
the random sample $(X_1, Y_1), \ldots, (X_n, Y_n)$ referred to in sections III and IV corresponds
to the axial data with both ends represented as two distinct sample points, or the
"doubled" sample in the general non-axial case. This ensures that the marginal
distributions $F_1$ and $F_2$ belong to the class $\mathcal{F}$.

As before, let $(X_1, Y_1), \ldots, (X_n, Y_n)$ denote a random sample of angular
variates on the basis of which we wish to test the null hypothesis of
independence. For any fixed $x$ in $[0, 2\pi)$, let $N_1(x)$ denote the number of $X_i$'s that
fall in the half-circle $[x, x + \pi)$. Similarly, for any fixed $y$ in $[0, 2\pi)$, let $N_2(y)$
denote the number of $Y_i$'s which fall in the half-circle $[y, y + \pi)$. Also, let
$N(x, y)$ denote the number of observations $(X_i, Y_i)$ that fall in the quadrant
$[x, x + \pi) \times [y, y + \pi)$. Now defining the indicator variables


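$$
I_i(x) = \begin{cases} 1 & \text{if } X_i \in [x, x + \pi), \\ 0 & \text{otherwise,} \end{cases} \tag{3.1}
$$

and

$$
I'_i(y) = \begin{cases} 1 & \text{if } Y_i \in [y, y + \pi), \\ 0 & \text{otherwise,} \end{cases} \tag{3.2}
$$

we obtain

$$
N_1(x) = \sum_{i=1}^{n} I_i(x), \qquad N_2(y) = \sum_{i=1}^{n} I'_i(y), \qquad \text{and} \qquad
N(x, y) = \sum_{i=1}^{n} I_i(x)\, I'_i(y). \tag{3.3}
$$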


If the hypothesis of independence holds, then we should have
$N(x, y) \approx N_1(x) \cdot N_2(y)/n$ by the usual arguments.

Thus we define

$$
D_n(x, y) = n^{-1/2} \left[ N(x, y) - N_1(x) N_2(y)/n \right] \tag{3.4}
$$

as a measure of discrepancy between the observed and expected (under the
hypothesis of independence) frequencies. Since $D_n(x, y)$ depends specifically
on the choices of $x$ and $y$, we suggest the (invariant) statistic


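$$
T_n = \int_0^{2\pi}\!\!\int_0^{2\pi} D_n^2(x, y) \, \frac{dx}{2\pi} \frac{dy}{2\pi}
= \frac{1}{4\pi^2 n} \int_0^{2\pi}\!\!\int_0^{2\pi} \Big[ N(x, y) - \frac{N_1(x) N_2(y)}{n} \Big]^2 dx \, dy \tag{3.5}
$$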


for testing independence. The integrand $D_n^2(x, y)$ is much like the usual
chi-square test for independence from a $2 \times 2$ table. We now derive a
computational form for $T_n$ in terms of the X and Y spacings.

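In view of (3.1), (3.2) and (3.3), we have

$$
n D_n^2(x, y) = \Big[ \Big(1 - \frac{1}{n}\Big) \sum_{i} I_i(x) I'_i(y) - \frac{1}{n} \sum_{i \ne j} I_i(x) I'_j(y) \Big]^2
$$

$$
= \Big(1 - \frac{1}{n}\Big)^2 \Big\{ \sum I_i I'_i + \sum I_{ij} I'_{ij} \Big\}
- \frac{2}{n} \Big(1 - \frac{1}{n}\Big) \Big\{ \sum I_i I'_{ij} + \sum I_{ij} I'_j + \sum I_{ij} I'_{jk} \Big\}
$$

$$
+ \frac{1}{n^2} \Big\{ \sum I_i I'_j + \sum I_{ij} I'_{ij} + \sum I_i I'_{jk} + \sum I_{ij} I'_k + 2 \sum I_{ij} I'_{jk} + \sum I_{ij} I'_{kl} \Big\},
$$

where $I_i = I_i(x)$, $I'_i = I'_i(y)$, $I_{ij} = I_i(x) I_j(x)$, $I'_{ij} = I'_i(y) I'_j(y)$, and summations are over all
(ordered) sets of distinct subscripts. It is easy to check that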


$$
\int_0^{2\pi} I_i(x) \, dx = \int_0^{2\pi} I'_i(y) \, dy = \pi,
$$

$$
\int_0^{2\pi} I_i(x) I_j(x) \, dx = \pi - D_{ij}, \qquad
\int_0^{2\pi} I'_i(y) I'_j(y) \, dy = \pi - D'_{ij},
$$


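where $D_{ij}$ and $D'_{ij}$ are the "circular" distances between $(X_i, X_j)$ and
$(Y_i, Y_j)$ respectively. Omitting the routine computations, we obtain

$$
T_n = \frac{1}{4\pi^2 n} \Big[ (n - 1)\pi^2 - \frac{\pi}{n} \sum \{ (\pi - D_{ij}) + (\pi - D'_{ij}) \}
+ \Big\{ \Big(1 - \frac{1}{n}\Big)^2 + \frac{1}{n^2} \Big\} \sum (\pi - D_{ij})(\pi - D'_{ij})
$$

$$
- \frac{2(n - 2)}{n^2} \sum (\pi - D_{ij})(\pi - D'_{jk})
+ \frac{1}{n^2} \sum (\pi - D_{ij})(\pi - D'_{kl}) \Big],
$$

where again the summations run over all the (ordered) sets of distinct subscripts. The statistic
$T_n$, being a function of the circular distances $\{D_{ij}\}$ and $\{D'_{ij}\}$, is clearly
invariant under rotations of either coordinate axis.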
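A computational check of this reduction can be sketched as follows. The first function evaluates $T_n$ from the circular distances using the integrals $\int I_i I_j \, dx = \pi - D_{ij}$, written with sums over all subscripts (taking $D_{ii} = 0$), which avoids the bookkeeping over distinct subscripts. The second integrates the definition (3.5) exactly, using the fact that the integrand is constant between consecutive breakpoints $X_i$ and $X_i - \pi$. Function names and the sample values are illustrative assumptions.

```python
import math

TWO_PI = 2.0 * math.pi

def circ_dist(a, b):
    """Shorter arc between two angles (the "circular" distance)."""
    d = abs(a - b) % TWO_PI
    return min(d, TWO_PI - d)

def t_n_spacings(xs, ys):
    """T_n from circular distances; sums over all subscripts, D_ii = 0."""
    n = len(xs)
    a = [[math.pi - circ_dist(xs[i], xs[k]) for k in range(n)] for i in range(n)]
    b = [[math.pi - circ_dist(ys[i], ys[k]) for k in range(n)] for i in range(n)]
    row_a = [sum(r) for r in a]
    row_b = [sum(r) for r in b]
    term1 = sum(a[i][k] * b[i][k] for i in range(n) for k in range(n))
    term2 = sum(ra * rb for ra, rb in zip(row_a, row_b))
    term3 = sum(row_a) * sum(row_b)
    return (term1 - 2.0 * term2 / n + term3 / n ** 2) / (4.0 * math.pi ** 2 * n)

def _in_half_circle(theta, start):
    """True if theta lies in the half-circle [start, start + pi)."""
    return (theta - start) % TWO_PI < math.pi

def _cells(angles):
    """(midpoint, length) of each arc between consecutive breakpoints."""
    pts = sorted({t % TWO_PI for t in angles} |
                 {(t - math.pi) % TWO_PI for t in angles})
    cells = []
    for k, lo in enumerate(pts):
        hi = pts[k + 1] if k + 1 < len(pts) else pts[0] + TWO_PI
        cells.append((0.5 * (lo + hi), hi - lo))
    return cells

def t_n_direct(xs, ys):
    """T_n by exact integration of the piecewise-constant integrand in (3.5)."""
    n = len(xs)
    total = 0.0
    for x, wx in _cells(xs):
        ix = [_in_half_circle(t, x) for t in xs]
        n1 = sum(ix)
        for y, wy in _cells(ys):
            iy = [_in_half_circle(t, y) for t in ys]
            n2 = sum(iy)
            n12 = sum(1 for p, q in zip(ix, iy) if p and q)
            total += (n12 - n1 * n2 / n) ** 2 * wx * wy
    return total / (4.0 * math.pi ** 2 * n)

xs = [0.3, 1.1, 2.5, 4.0, 5.7, 0.9]
ys = [2.0, 0.4, 5.1, 3.3, 1.7, 6.0]
print(t_n_spacings(xs, ys), t_n_direct(xs, ys))
```

The distance-based form needs only on the order of $n^2$ operations, and rotating either coordinate leaves its value unchanged because only the circular distances enter.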


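To derive the asymptotic null distribution of $T_n$, we will utilize the
methods of Fourier analysis similar to those in Rao (1972) and Rothman (1971).
Since $D_n(x, y)$ is a doubly periodic function (in both $x$ and $y$) we may find
the Fourier expansion

$$
D_n(x, y) = \sum_{k=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} Z_{km} \, e^{-i(kx + my)}, \tag{4.1}
$$

where $Z_{km} = (4\pi^2)^{-1} \int_0^{2\pi}\!\int_0^{2\pi} D_n(x, y) \, e^{i(kx + my)} \, dx \, dy$.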
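From the definitions of $I_i(x)$ and $I'_i(y)$, it follows that

$$
Z_{km} = \begin{cases}
0, & \text{if } k \text{ or } m \text{ is even,} \\[6pt]
\dfrac{1}{\pi^2 m k \sqrt{n}} \Big[ \dfrac{1}{n} \sum_{j=1}^{n} e^{ikX_j} \sum_{j=1}^{n} e^{imY_j} - \sum_{j=1}^{n} e^{ikX_j + imY_j} \Big], & \text{if both } k \text{ and } m \text{ are odd.}
\end{cases}
$$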


Thus from (3.5), (4.1) and an application of Parseval's theorem, we
have

$$
T_n = \sum_{k=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} |Z_{km}|^2 .
$$
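The Parseval representation can be checked numerically with a truncated sum. The sketch below assumes the coefficient $Z_{km} = (\pi^2 m k \sqrt{n})^{-1} [\, n^{-1} \sum_j e^{ikX_j} \sum_j e^{imY_j} - \sum_j e^{i(kX_j + mY_j)} ]$ for odd $k$, $m$ (zero otherwise), and compares the partial sum over $|k|, |m| \le 199$ with $T_n$ computed from circular distances. The function names, the truncation point and the sample values are assumptions of this illustration; the truncated sum can only approach $T_n$ from below.

```python
import cmath
import math

TWO_PI = 2.0 * math.pi

def circ_dist(a, b):
    d = abs(a - b) % TWO_PI
    return min(d, TWO_PI - d)

def t_n_spacings(xs, ys):
    """T_n from circular distances (sums over all subscripts, D_ii = 0)."""
    n = len(xs)
    a = [[math.pi - circ_dist(xs[i], xs[k]) for k in range(n)] for i in range(n)]
    b = [[math.pi - circ_dist(ys[i], ys[k]) for k in range(n)] for i in range(n)]
    row_a = [sum(r) for r in a]
    row_b = [sum(r) for r in b]
    term1 = sum(a[i][k] * b[i][k] for i in range(n) for k in range(n))
    term2 = sum(ra * rb for ra, rb in zip(row_a, row_b))
    term3 = sum(row_a) * sum(row_b)
    return (term1 - 2.0 * term2 / n + term3 / n ** 2) / (4.0 * math.pi ** 2 * n)

def z_km(k, m, xs, ys):
    """Fourier coefficient of D_n; zero unless both k and m are odd."""
    if k % 2 == 0 or m % 2 == 0:
        return 0j
    n = len(xs)
    ex = sum(cmath.exp(1j * k * x) for x in xs)
    ey = sum(cmath.exp(1j * m * y) for y in ys)
    exy = sum(cmath.exp(1j * (k * x + m * y)) for x, y in zip(xs, ys))
    return (ex * ey / n - exy) / (math.pi ** 2 * m * k * math.sqrt(n))

def parseval_partial_sum(xs, ys, kmax):
    """Sum of |Z_km|^2 over odd k, m with |k|, |m| <= kmax."""
    total = 0.0
    odd = range(1, kmax + 1, 2)
    for k in odd:
        for m in odd:
            # (k, m) and (-k, -m) are complex conjugates, as are
            # (k, -m) and (-k, m), so each modulus is counted twice.
            total += 2.0 * (abs(z_km(k, m, xs, ys)) ** 2 +
                            abs(z_km(k, -m, xs, ys)) ** 2)
    return total

xs = [0.3, 1.1, 2.5, 4.0, 5.7, 0.9]
ys = [2.0, 0.4, 5.1, 3.3, 1.7, 6.0]
t_exact = t_n_spacings(xs, ys)
t_partial = parseval_partial_sum(xs, ys, 199)
print(t_exact, t_partial)
```

Since the terms decay like $1/(k^2 m^2)$, the truncated sum converges slowly, which is one reason the closed form in terms of circular distances is preferable for computation.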


It can be verified that under the null hypothesis of independence

$$
E Z_{km} = 0 \quad \forall \; k, m
$$

and

$$
E Z_{km} \bar{Z}_{k'm'} = \begin{cases}
\dfrac{\delta_{kk'} \, \delta_{mm'}}{\pi^4 k^2 m^2} \Big(1 - \dfrac{1}{n}\Big), & \text{if both } k \text{ and } m \text{ are odd,} \\[6pt]
0, & \text{otherwise,}
\end{cases}
$$

where $\delta_{jk}$ is the usual Kronecker delta. Thus, the random Fourier
coefficients $Z_{km}$ have zero means and are uncorrelated for distinct pairs $(k, m)$.

It may be remarked here that the above expectations may be calculated under
the assumption that the $X_i$'s and $Y_i$'s have a uniform distribution and,
under the null hypothesis, are further independent, in view of the fact that we
could use a probability integral transformation as in Blum, Kiefer and Rosenblatt
(1961). It is clear that for any $F \in \mathcal{F}$, the probability integral transformation $\alpha \to F(\alpha)$
preserves half circles. Hence under the transformations $x \to F_1(x)$, $y \to F_2(y)$,
the numbers $N_1(x)$, $N_2(y)$ and $N(x, y)$ remain unchanged, as does the statistic $T_n$.
Now by arguments similar to those in Rao (1972) or Rothman (1971), we
have the result that $T_n$ has asymptotically the same distribution as the sum of
the squares of independent normal variables with zero means and variances
$(\pi^4 k^2 m^2)^{-1}$ for odd $k$ and $m$. Thus the asymptotic characteristic function
of $T_n$ is given by

$$
\prod_{k \text{ odd}} \; \prod_{m \text{ odd}} \Big( 1 - \frac{2it}{\pi^4 k^2 m^2} \Big)^{-1/2}
= \prod_{k=1}^{\infty} \prod_{m=1}^{\infty} \Big( 1 - \frac{2it}{\pi^4 (2k-1)^2 (2m-1)^2} \Big)^{-2}.
$$


This characteristic function can be formally inverted as in Rao (1972). On the
other hand, by a result of Zolotarev (1961), if $F(x)$ denotes the asymptotic
d.f. of $T_n$, then the upper tail probabilities relating to $T_n$ may be approximated
as follows:

$$
\lim_{x \to \infty} \frac{1 - F(x)}{P[\chi_4^2 > \pi^4 x]}
= \prod_{\substack{k, m = 1 \\ (k, m) \ne (1, 1)}}^{\infty} \Big( 1 - \frac{1}{(2k-1)^2 (2m-1)^2} \Big)^{-2},
$$

where $\chi_4^2$ denotes a random variable having the chi-square distribution with
4 degrees of freedom.
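The tail approximation can be evaluated with a short sketch. It assumes the limiting ratio is the product $\prod_{(k,m) \ne (1,1)} \{1 - 1/[(2k-1)^2 (2m-1)^2]\}^{-2}$, truncates that product at a finite number of terms, and uses the closed form $P[\chi_4^2 > z] = e^{-z/2}(1 + z/2)$ for the chi-square tail. The function names and the truncation point are illustrative choices, and the approximation is only meant for large values of the statistic.

```python
import math

def chi2_4_sf(z):
    """P[chi-square with 4 d.f. > z] = exp(-z/2) * (1 + z/2) for z >= 0."""
    if z <= 0.0:
        return 1.0
    return math.exp(-z / 2.0) * (1.0 + z / 2.0)

def zolotarev_constant(terms=400):
    """Truncated product over (k, m) != (1, 1), accumulated on the log scale."""
    log_c = 0.0
    for k in range(1, terms + 1):
        for m in range(1, terms + 1):
            if k == 1 and m == 1:
                continue
            ratio = 1.0 / ((2 * k - 1) ** 2 * (2 * m - 1) ** 2)
            log_c += -2.0 * math.log1p(-ratio)
    return math.exp(log_c)

def approx_upper_tail(x, constant):
    """Approximate P[T_n > x] for large x, capped at one."""
    return min(1.0, constant * chi2_4_sf(math.pi ** 4 * x))

constant = zolotarev_constant()
for x in (0.2, 0.3, 0.4):
    print(x, approx_upper_tail(x, constant))
```

In use, an observed value of $T_n$ would be passed as `x`, and a small approximate tail probability would count as evidence against independence.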

The authors wish to thank Professor Christopher Bingham for pointing
out the somewhat restrictive nature of the test proposed here.